Over the last decade, there has been increasing 
attention focused on the inadequacy of the current 
methodology employed in randomized clinical trials 
involving new antidepressant medications. The primary 
focus of this concern has centered on the need to ade- 
quately differentiate the effectiveness of new treatments 
from the placebo condition. There has been considerable 
consternation because of the increasing rate of placebo 
response seen in all types of trials in psychiatry, particularly 
trials of mood and anxiety disorders. 1-3 This growing 
awareness has led to a variety of different efforts that have 
begun to address concerns about trial design and methodology. 
4-6 These include an ongoing series of workshops 
sponsored by the National Institute of Mental Health 
(NIMH) and the New Clinical Drug Evaluation Unit 
(NCDEU). 7 The NIMH has also hosted a series of con- 
sensus conferences over the last few years in an attempt to 
begin to focus attention on these concerns. Such confer- 
ences have investigated issues including placebo and 
placebo response and the development of new instru- 
ments for the assessment of mood and anxiety disorders. 
There has also been a series of international meetings, 
including a symposium held in Rhodes, Greece in 2000, 
which brought together international experts in method- 
ology with senior staff from the NIMH and the Food and 
Drug Administration (FDA). The culmination of these 
concerted efforts was a consensus statement that was pub- 
lished in Neuropsychopharmacology in 2002. 8 The Rhodes 
panel identified 4 critical problem areas: (i) the nature of 
the patient sample; (ii) the limitations of behavioral meth- 
ods and analyses used for assessing treatment-related 
improvement and recovery; (iii) the lack of consensus 
about standards for determining speed of onset and action 
for medications; and (iv) the failure to integrate advances 
into our knowledge about depression in antidepressant 



This paper reviews some of the challenges faced by indi- 
viduals who design and implement clinical trials of poten- 
tial antidepressant medications. Particular emphasis is 
placed on questioning the validity of some of the theo- 
retical assumptions that form the underpinnings of most 
conventional trials. Work from our group developing clin- 
ical trial methodology for minor depression is used as an 
example of how alternate constructs may be helpful to 
differentiate drug-placebo differences. 
development with current clinical trial design. The topics 
requiring greater emphasis include concerns about the 
validity of our current diagnostic nosology, as well as ques- 
tions about how diagnoses are made. There are also ques- 
tions about the best way to assess the severity of psychi- 
atric syndromes. Our current standard is to use 
psychometric rating scales. However, many times these 
scales only reflect one dimension of a complex illness. 
Another critical issue is the number, as well as the length, 
of the evaluations to be performed. A related issue of con- 
cern is the total length of time that is given to the evalua- 
tion of the active treatments. One of the major recurrent 
challenges faced in medication development is ensuring 
that the trials are adequately powered in order to differ- 
entiate relatively subtle differences. Very often power cal- 
culations are not based on empirical data, but rather 
reflect the aspirations of the trial design planners. 
Assumptions made about the sample for the study often 
end up greatly influencing the trial design. These assump- 
tions are made in order to facilitate the use of relatively 
simple inferential statistical models. However, some of 
these assumptions reflect lack of thought about the psy- 
chiatric syndromes. One of the intrinsic assumptions made 
in the design of trials is that the sample being analyzed will 
be relatively homogeneous. We frequently attempt to con- 
trol for age, ethnicity, length of illness, comorbid diagnosis, 
and comorbid medical factors. We frequently do not go to 
great lengths to determine that the subjects being evalu- 
ated truly have a similar disorder. Yet, any clinician will 
readily attest that patients with depression in clinical prac- 
tice clearly respond differently to the same medication 
and, in some cases, do not respond at all. 9-14 This suggests 
that there is considerable heterogeneity within the group 
of individuals who have major depressive disorder. 
Furthermore, clinicians can certainly confirm that the same 
medication given to different individuals may produce 
very different side-effect profiles for each of those indi- 
viduals. Even simple clinical observation suggests that we 
are dealing with a heterogeneous syndrome when we dis- 
cuss major depressive disorder. An overview of any large 
clinical trial's database will demonstrate that improvement 
is not uniform for subjects receiving an active, effective 
treatment. Some individuals get markedly better, while 
many individuals do not improve at all during a standard 
antidepressant trial. 
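
The point made above about power calculations can be illustrated with a minimal sketch. It assumes a simple two-arm comparison of mean change scores and uses the usual normal approximation; the function name n_per_arm and the standardized effect sizes (0.5, 0.3, 0.2) are hypothetical values chosen for illustration, not estimates taken from any trial discussed here.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate number of subjects per arm for a two-arm, two-sided
    comparison of means, using the normal approximation
    n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2.

    effect_size is the assumed standardized drug-placebo difference (d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    # Hypothetical effect sizes: an optimistic planning value and two
    # smaller values of the kind a heterogeneous sample might yield.
    for d in (0.5, 0.3, 0.2):
        print(f"d = {d}: about {n_per_arm(d)} subjects per arm")
```

Under these assumptions, moving from an aspirational effect size of 0.5 to a more modest 0.2 raises the requirement from 63 to 393 subjects per arm, which is why a power calculation that reflects hope rather than data can leave a trial badly underpowered.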

The representativeness of the sample poses another con- 
cern. After the advent of the Diagnostic and Statistical 
Manual of Mental Disorders, Revised Third Edition, the 



concept of comorbidity was given much greater weight. 
Prior to that, a hierarchical approach to diagnosis was used. 
The emphasis on the presence of comorbid disorders led 
to the development of rigorous inclusion and exclusion cri- 
teria for most studies. Although there is little empirical evi- 
dence that supports the use of most of these inclusion and 
exclusion criteria, they have become standardized, and in 
many cases, quite limiting. However, it should be noted that 
many of these criteria seem to be developed as part of a 
response to perceived expectations by regulatory agencies 
such as the FDA and the European regulatory authorities. 
Nevertheless, these criteria end up limiting the represen- 
tativeness of the sample being investigated. The majority 
of individuals suffering from the syndrome are excluded 
from participation in these trials. Therefore, we have lim- 
ited information about the generalizability of either posi- 
tive or negative results to the syndrome in general. 
A factor that is rarely discussed is the lack of stability 
inherent in most of these syndromes. Most clinical trials use 
one rating scale as a primary measure of success. Therefore, 
the trial measures only a limited aspect of that syndrome. 
A second assumption that is made in the design of the trial 
and the treatment of the disorder is that the disorder itself 
will be relatively stable if no intervention is made. 
Unfortunately, this is a fallacious assumption. Some individuals 
demonstrate significant week-to-week variation in 
rating measures, independent of any type of treatment 
intervention. This intrinsic fluctuation associated with the 
disorder makes it difficult to discern what degree of change 
can be attributed to either the placebo condition or the 
active treatment condition. This has led to a reductionistic 
approach to analysis of subjects and trials, where all 
responses and changes are essentially attributed to either 
the active treatment condition or the placebo condition. 
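
The effect of intrinsic fluctuation can be sketched with a small simulation. It assumes, purely for illustration, that each subject has a stable underlying severity, that every rating adds independent week-to-week noise, and that subjects are enrolled only if their screening score reaches a threshold. All numbers (mean severity 18, noise standard deviation 4, threshold 20) are hypothetical and are not drawn from any rating scale data.

```python
import random
from statistics import mean


def simulate_untreated_change(n_screened=5000, threshold=20.0,
                              noise_sd=4.0, seed=0):
    """Return (mean change, number enrolled) for subjects who receive no
    treatment at all but were selected on a noisy screening score."""
    rng = random.Random(seed)
    changes = []
    for _ in range(n_screened):
        true_severity = rng.gauss(18.0, 3.0)      # stable underlying level
        screening = true_severity + rng.gauss(0.0, noise_sd)
        if screening < threshold:
            continue                              # not enrolled
        # Follow-up rating: same underlying severity, fresh fluctuation.
        follow_up = true_severity + rng.gauss(0.0, noise_sd)
        changes.append(follow_up - screening)
    return mean(changes), len(changes)


if __name__ == "__main__":
    mean_change, n_enrolled = simulate_untreated_change()
    print(f"enrolled: {n_enrolled}, mean change: {mean_change:.2f}")
```

In this sketch the enrolled group shows an average drop in score even though nothing was done to anyone, because subjects tend to be enrolled on a week when their rating happens to be high. A trial that attributes all change to either drug or placebo would count this drop as response.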

Lessons learned from clinical trials 
investigating minor depressive disorder 

One can use randomized clinical trials in minor depressive 
disorder as a case study to emphasize some of the chal- 
lenges faced in trial design and possibly some solutions to 
these challenges. Minor depressive disorder is an area 
where there is no consensus about its conceptualization or 
definition. Some individuals believe that minor depression 
is merely a segue into major depressive disorder, while 
others consider minor depression an entity in itself. 15-17 
Some individuals worry that investigating minor depres- 
sion trivializes the core concept of major depressive dis- 









order, while others consider it an important part of the 
spectrum of depressive syndromes. 18 Even among those 
who believe that minor depression is a valid concept that 
requires rigorous investigation, there is considerable 
debate about what the definition of minor depression is or 
should be. 19 Furthermore, there is little empirical evidence 
to support any of the currently employed definitions. 
Many of the older clinical trials investigating minor 
depression actually grouped patients into cohorts that con- 
tained individuals with major depressive disorder 
described as being mild in severity. Some of these trials did 
not differentiate between major depressive disorder and 
a diagnosis of minor depression, but merely stated that 
those with lower Hamilton Depression Rating Scale 
(HAMD) scores should be considered as having minor 
depression. Other trials combined patients with major 
depression of a milder form with Research Diagnostic 
Criteria (RDC) patients with minor depression. Older tri- 
als employed either tricyclic antidepressant medications 
or antipsychotic medications. It is not surprising, based on 
the side-effect profiles of these agents and the weighting 
of the HAMD towards somatic concerns, that it was diffi- 
cult to differentiate an active treatment response from a 
placebo response. A second challenge that studies of minor 
depression emphasize is the use of rating scales that were 
developed at another time and for another diagnostic 
entity to assess minor depression. All of the older studies 
used the 17-item HAMD as a primary outcome measure. 20 As 
discussed above, this rating scale, developed to assess inpa- 
tients with endogenous depression, is heavily weighted 
toward somatic and/or vegetative factors. This makes the 
HAMD a very coarse instrument to use for individuals 
with milder forms of depression or minor depression, since 
neither somatic nor vegetative symptoms are highly 
prominent in such patients. Furthermore, these less 
prominent symptoms tend to be transient in presentation 
and thus may vary greatly from week to week on a rating 
scale. This emphasizes the importance of carefully ensur- 
ing that the methods of assessment fit the most relevant 
signs of the syndrome being studied. Very often, both gov- 
ernment and industry have been willing to commit a large 
percentage of limited resources to clinical trial research, 
without considering the appropriateness of the measures 
in assessing the full scope of the syndrome. 
Another concern highlighted by investigations of minor 
depression is the lack of objective measures of either func- 
tional or quality of life impairment. This problem is also 
true for most studies of most psychiatric disorders. Thus, in 



spite of the fact that the Diagnostic and Statistical Manual 
of Mental Disorders, Fourth Edition (DSM-IV) requires 
functional impairment or quality of life impairment to be 
present in order for a diagnosis of the syndrome to be 
made, there have been few efforts to establish some type 
of criteria for quality of life or functional impairment with 
these disorders. 21 It has been shown in primary care stud- 
ies that many people who seem to meet criteria for psy- 
chiatric syndromes have spontaneous remissions when fol- 
lowed longitudinally. This may well reflect the inclusion of 
individuals who, because of life stress, have a particular 
series of signs and symptoms, but in actual fact do not have 
the pathology associated with a lifelong syndrome. As 
would be expected, the result of not paying attention to 
these challenges when designing clinical trials is that the 
trials tend to be uninformative, if not misleading. 
In contrast to some of the problems identified above, a 
consortium of investigators at the University of California, 
San Diego, the University of Texas Southwestern, Western 
Psychiatric Institute and Clinics, and Eli Lilly conducted a 
multisite trial of minor depressive disorder (Judd et al, 
manuscript submitted). In order to deal with the concerns 
about the diagnosis of minor depression, the following cri- 
teria were used to operationalize our definition: (i) a sub- 
ject had to have dysphoria and anhedonia plus at least one 
additional symptom of major depressive disorder from a 
DSM-IV checklist, or dysphoria or anhedonia and two 
additional symptoms of major depressive disorder; (ii) a 
clear-cut functional disability as evidenced by a Global 
Assessment of Functioning (GAF) score of less than 70 
and Medical Outcome Survey (MOS) subscale score of 
less than 75 for social functioning, and of less than 67 for 
emotional role functioning. 22-23 In developing these criteria, 
we recognized that they were rather arbitrary and thus 
felt it was necessary to be rigorous and precise with our 
definition of what the syndrome was. We deliberately 
decided to include individuals with a past history of major 
depressive disorder or dysthymia, as long as they had been 
in remission for at least 2 years prior to developing their 
current episode of minor depression. Furthermore, we 
required individuals to have had minor depression for a 
minimum of 1 month prior to entering the trial. We delib- 
erately did not use a longer period than 1 month, since it 
is difficult to gather accurate retrospective information 
about the presence of minor symptoms. However, in order 
to compensate for concern that our definition of minor 
depression was merely a way station for individuals going 
into major depressive disorder or recovering from major 
depressive disorder and becoming euthymic, we included 
a 4-week, single-blind, placebo lead-in phase to the study. 
This caused a second dilemma: which individuals would 
we exclude from the study? Since all of the rating scales 
we employed were validated for major depressive disor- 
der and not minor depression, it was difficult to know how 
individuals would score on these measures. Furthermore, 
we did not have any data to suggest to us what the range 
of symptomatology should be for individuals with minor 
depressive disorder on these rating measures. Therefore, 
we decided to require that individuals meet the same 
entrance requirements for 3 out of 4 weeks in the placebo 
phase, including the last 2 weeks prior to randomization, 
in order to enter the study. This facilitated getting an accu- 
rate sense of the types of changes one would see on the 
rating scales, independent of their having a bearing on 
whether or not individuals were able to enter the double- 
blind portion of the study. 
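
The entry rules described above can be written out as a minimal sketch. The class and function names are hypothetical, the sketch reads the three functional thresholds as all being required, and it does not model the exclusion of subjects who meet full criteria for major depressive disorder, which the text does not spell out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class WeeklyAssessment:
    dysphoria: bool
    anhedonia: bool
    additional_symptoms: int       # other DSM-IV major depression symptoms
    gaf: float                     # Global Assessment of Functioning score
    mos_social_functioning: float  # MOS social functioning subscale
    mos_emotional_role: float      # MOS emotional role functioning subscale


def meets_entry_criteria(a: WeeklyAssessment) -> bool:
    """Criterion (i): both core symptoms plus at least one additional
    symptom, or one core symptom plus at least two additional symptoms.
    Criterion (ii): GAF < 70, MOS social < 75, MOS emotional role < 67."""
    core = int(a.dysphoria) + int(a.anhedonia)
    symptoms_ok = (core == 2 and a.additional_symptoms >= 1) or (
        core == 1 and a.additional_symptoms >= 2
    )
    disability_ok = (
        a.gaf < 70
        and a.mos_social_functioning < 75
        and a.mos_emotional_role < 67
    )
    return symptoms_ok and disability_ok


def passes_placebo_lead_in(weeks: Sequence[WeeklyAssessment]) -> bool:
    """Criteria must be met in at least 3 of the 4 lead-in weeks,
    including the last 2 weeks before randomization."""
    if len(weeks) != 4:
        raise ValueError("expected exactly four weekly assessments")
    met = [meets_entry_criteria(w) for w in weeks]
    return sum(met) >= 3 and met[2] and met[3]
```

Written this way, a subject who misses criteria only in week 1 or week 2 can still be randomized, whereas a subject who misses them in either of the final 2 weeks cannot, regardless of the earlier weeks.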

Some of the other important features of this trial included 
the use of the Inventory of Depressive Symptomatology — 
Clinician Rated (IDS-C) as well as three different forms 
of the HAMD (17-, 21-, and 28-item), the Hamilton 
Anxiety Rating Scale, the GAF, and the MOS Short Form 
(SF-36). 2022 The IDS-C was identified as the primary out- 
come measure because of the unique features of the scale. 
First, this scale encompasses a much broader range of 
depressive symptomatology, extending from various psy- 
chological symptoms through somatic symptoms. Second, 
this symptom scale attempts to quantify, in a uniform way, 
both the severity and the intensity of symptomatology. Yet, 
we were not comfortable merely using existing rating 
scales as a way of assessing response in this trial. Therefore, 
we also investigated the effects of active treatment on 
complete resolution of symptoms of depression, plus 
restoration of functioning. This is a remarkably high bar to 
clear. 

Another crucial feature, as described earlier in this arti- 
cle, is the length of the evaluation. Many early random- 
ized clinical trials were 2- to 4-week placebo-controlled 
trials. 23 Over time, trials have extended to 6 to 8 weeks' 
duration. Yet, as is clearly emphasized by the work of 
Stassen and colleagues and others, many individuals with 
major depressive disorder are just beginning to reach 
recovery at the 8- to 12-week time points. 24 Therefore, we 
elected a 12-week acute trial particularly because we 
were interested in determining the number of individuals 
that met remission criteria, as well as the change in rating 
scale scores. The primary outcome measure, again, was the change in the 
IDS-C, with the major outcome point being the ability to 
achieve complete remission for 1 month prior to the end 
of the trial. 

Because we were dealing with minor depression, the trial created a 
series of opportunities that we felt we had to explore in 
order to gather pilot data in case further investigations were 
warranted. One major question was: what 
happens if one allows individuals to undergo an extended 
period of time on placebo (ie, 4 months)? Will this impact 
response to pharmacotherapy? A second question was: is 
acute treatment of minor depression sufficient? Will indi- 
viduals who respond acutely require continuation treat- 
ment, as is the case with major depressive disorder? 
Additionally, what is the course of untreated minor 
depression for individuals who participate in a trial? Are 
we placing these people at an increased risk or burden by 
their continued presence in the trial while on placebo? In 
order to gather pilot data to begin to answer these ques- 
tions, individuals who completed the initial 12 weeks of the 
trial entered a continuation phase. Randomization for both the 
acute and continuation phases of the trial was performed at the 
initial point of randomization, rather than through a second 
randomization after completion of the acute trial. Therefore, 
individuals in this trial were randomized at entry to one of four 
acute-continuation sequences of fluoxetine or placebo: 
fluoxetine-fluoxetine, fluoxetine-placebo, placebo-placebo, or 
placebo-fluoxetine. Analysis of the continuation phase of the study 
was specified a priori to be exploratory, because we knew that 
the cell sizes would not be sufficient to answer these 
questions. 
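
A single up-front randomization to the four sequences can be sketched as follows. The use of permuted blocks of four with equal allocation is an assumption made for illustration; the text does not state the allocation method or ratios, and the function name is hypothetical.

```python
import random

# The four (acute phase, continuation phase) sequences named in the text.
SEQUENCES = [
    ("fluoxetine", "fluoxetine"),
    ("fluoxetine", "placebo"),
    ("placebo", "placebo"),
    ("placebo", "fluoxetine"),
]


def randomize_at_baseline(subject_ids, seed=0):
    """Assign each subject an (acute, continuation) sequence once, at entry.

    Assumption: permuted blocks of four, so every consecutive block of
    four subjects receives each sequence exactly once.
    """
    rng = random.Random(seed)
    allocation = {}
    block = []
    for subject_id in subject_ids:
        if not block:
            block = SEQUENCES.copy()
            rng.shuffle(block)
        allocation[subject_id] = block.pop()
    return allocation


if __name__ == "__main__":
    for subject_id, (acute, continuation) in randomize_at_baseline(range(8)).items():
        print(subject_id, acute, "->", continuation)
```

Because the continuation arm is fixed at entry, no second randomization visit is needed, but each of the four cells receives only about a quarter of the sample, which is consistent with treating the continuation analysis as exploratory.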

There were several features of the analysis plan that 
were unique. First was the realization that minor depres- 
sion was most likely a heterogeneous syndrome. 
Therefore, we acknowledged the need to investigate the 
relationship between minor depression and a previous his- 
tory of major depressive disorder and dysthymia, and also 
the relationship between minor depression and a family 
history of psychiatric disorders. In an attempt to more 
thoroughly utilize the data that would be gathered in this 
study, we decided that a mixed regression model would be 
more powerful than a standard analysis of variance 
approach. However, since the random regression 
model is not as accepted in psychiatric literature, we spec- 
ified in the initial data analysis plan that both types of 
analyses be performed. A third aspect of this study was the 
evaluation of the categorical end point (ie, full remission 
of symptoms and return of functioning), as well as the 
parametric end points. 
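
For readers less familiar with random regression, one generic form of such a model is sketched below. This is a textbook specification offered as an assumption for illustration, not the model specified in the analysis plan; the symbols are introduced here and do not appear elsewhere in the article.

```latex
Y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 G_i + \beta_3 (G_i \times t_{ij})
         + b_{0i} + b_{1i} t_{ij} + \varepsilon_{ij}
```

Here Y_{ij} is the rating scale score (for example, the IDS-C) of subject i at visit j, t_{ij} is time since randomization, G_i indicates treatment group, b_{0i} and b_{1i} are subject-specific random intercept and slope terms, and \varepsilon_{ij} is residual error. The coefficient \beta_3 captures the drug-placebo difference in rate of change. Because every available visit contributes to the estimate, such a model can use data from subjects with missing or unevenly spaced visits, which an end point analysis of variance cannot.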

One can use the design of this trial in minor depression to 
address a number of the challenges that we had earlier 
identified. This trial is a good example of the type of con- 
sensus thinking process that can be used to enhance 
diagnostic rigor and assessment of severity of illness. 
Furthermore, it highlights the need to carefully review the 
items within assessment instruments, in order to ensure 
that the instruments are as useful as possible for the dis- 
order. This study also highlights the type of thought that 
should go into determining the evaluation for a study. 
Since minor depression was an unknown entity at the 
time, one of the key questions was: how stable is this 
entity over time? Although 4 weeks is an arbitrary length 
of time, we felt, for ethical and scientific reasons, that it 
would be an adequate length of time for an extended 
placebo run-in. 

Conclusions 

A careful and critical review of clinical trial methodolo- 
gies is imperative for the field to move forward. Attention 
to many of the assumptions that are implicitly made 
when a trial is designed will be critical in enhancing the 
success of clinical trials. We must think closely about the 
diagnostic criteria used in the trial and about the inclusion and 
exclusion criteria. Many of the currently accepted criteria 
limit the generalizability of the findings and have not been 
demonstrated in a systematic fashion to enhance differ- 
entiation of drug versus placebo responses. Yet, some 
important aspects of the very definition of these syn- 
dromes have been neglected, in particular, the importance 
of including functional disability and quality of life dys- 
function as part of the definition of the syndrome. Some 
individuals may present with a requisite number of symp- 
toms, but may not be as adversely affected as if they had 



had a profound, long-lasting syndrome. It is quite likely 
they are suffering from a transient constellation of symp- 
toms due to an external stressor. A second important con- 
cern is the appropriateness of the assessments that are 
being used in randomized controlled trials. Very often, the 
assessments that are employed represent "me too" assess- 
ments, because studies done by other companies have 
used the measures in the past. Yet, this may not reflect our 
best knowledge about the disorder being studied, nor a 
sufficient way of bringing a new compound onto the mar- 
ket. Frequently, the argument for the use of such instru- 
ments is that they are supposedly mandated by regulatory 
agencies. However, more often than not, this is a myth 
that is perpetuated rather than the outcome of frank and 
careful discussions with the regulatory authority. A third 
important issue that requires some thought is assumptions 
about the stability of the syndrome over time. Many times, 
studies are designed with the assumption that random- 
ization to placebo should lead to a relatively static or, if 
anything, disadvantageous course for patients. Yet, inves- 
tigation of most medical syndromes suggests that there is 
an intrinsic waxing and waning to the course of the syn- 
drome. Therefore, arbitrary assessment using instruments 
that investigate only one aspect of the syndrome may well 
lead to spurious results. 

A last concern, but one that can greatly influence a trial, 
involves appropriate statistical design. Often studies are 
powered based on desire, rather than available data. A sec- 
ond concern is that often a new statistical design repre- 
sents the easiest or safest design, rather than a design that 
is most likely to produce informative results. 
In conclusion, it is clear that there is tremendous oppor- 
tunity to improve the design and methodology used in ran- 
domized clinical trials. The recognition of these challenges 
by the NIMH, the FDA, the European regulatory author- 
ities, as well as industry, implies that important future 
change is likely to occur. 
Challenges posed by the development of clinical trials for major 
depressive disorders: lessons learned from trials in minor 
depression 